Acceleration of short and long DNA read mapping without loss of accuracy using suffix array
Abstract
HPG Aligner applies suffix arrays to DNA read mapping. The implementation produces a highly sensitive and extremely fast mapping of DNA reads that scales almost linearly with read length. The approach is faster (over 20× for long reads) and more sensitive (over 98% across a wide range of read lengths) than current state-of-the-art mappers. HPG Aligner is not only an optimal alternative for current sequencers but also the only solution available to cope with the longer reads and growing throughputs produced by forthcoming sequencing technologies.
Availability and implementation: https://github.com/opencb/hpg-aligner.
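To make the suffix-array idea concrete, here is a minimal Python sketch of exact read lookup in a reference sequence. It is an illustration only, not HPG Aligner's implementation: the function names `build_suffix_array` and `find_occurrences` are hypothetical, the construction is a naive sort suitable only for short strings, and it handles exact matches, whereas a real mapper must also tolerate mismatches and gaps.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of text, in lexicographic order.

    Naive construction (sorts full suffixes); fine for a toy example,
    far too slow and memory-hungry for a real genome.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return every position in text where pattern occurs exactly.

    All suffixes that start with pattern are contiguous in the suffix
    array, so two binary searches locate the block's boundaries.
    """
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


if __name__ == "__main__":
    reference = "ACGTACGTGA"  # toy reference, not real genomic data
    sa = build_suffix_array(reference)
    for read in ("ACG", "GT", "TTT"):
        print(read, find_occurrences(reference, sa, read))
```

Once the index is built, each lookup needs a number of string comparisons logarithmic in the reference length, which is why suffix-array indexes are attractive when the same reference is queried by many reads.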
Related resources
Accurate Long Read Mapping using Enhanced Suffix Arrays
With the rise of high-throughput sequencing, new programs have been developed for aligning huge amounts of short-read data to reference genomes. Recent developments in sequencing technology allow longer reads, but mappers designed for short reads are not suited to reads of several hundred base pairs. We propose an algorithm for mapping longer reads, which is based on chain...
CGAP-Align: A High Performance DNA Short Read Alignment Tool
Background: Next-generation sequencing platforms have greatly reduced sequencing costs, leading to the production of unprecedented amounts of sequence data. BWA is one of the most popular alignment tools due to its relatively high accuracy. However, mapping reads using BWA is still the most time-consuming step in sequence analysis. Increasing mapping efficiency would allow the community to bette...
Utilization of Suffix Array for Quick STD and Its Evaluation on the NTCIR-9 SpokenDoc Task
We propose a technique for quickly detecting keywords in a very large speech database without using a large amount of memory. To accelerate the search and reduce memory use, we employed a suffix array as the data structure and applied phoneme-based DP matching to it. To avoid an exponential explosion of processing time with keyword length, a long keyword is divided into short sub-keyword...
Accurate Taxonomic Assignment of Short Pyrosequencing Reads
Ambiguities in the taxonomy-dependent assignment of pyrosequencing reads are usually resolved by mapping each read to the lowest common ancestor, in a reference taxonomy, of all the sequences that match the read. This conservative approach has the drawback of mapping a read to a possibly large clade that may also contain many sequences not matching the read. A more accurate taxonomic assignment...
A Fast and Accurate FPGA System for Short Read Mapping Based on Parallel Comparison on Hash Table
The purpose of DNA sequencing is to determine the order of nucleotides within a target DNA molecule. The target DNA molecules are fragmented into short reads, which are short fixed-length subsequences composed of ‘A’, ‘C’, ‘G’, and ‘T’, by a next-generation sequencing (NGS) machine. To reconstruct the target DNA from the short reads using a reference genome, which is a representative example of a spe...
Volume: 30
Publication date: 2014